Textractor: A Framework for Extracting Relevant Domain Concepts from Irregular Corporate Textual Datasets

Authors

Ashwin Ittoo

Laura Maruster

Hans Wortmann

Gosse Bouma

Abstract

Various information extraction (IE) systems for corporate usage exist. However, none of them target the product development and/or customer service domain, despite its significant application potential and benefits. This domain also poses new scientific challenges, such as the lack of external knowledge resources, and irregularities like ungrammatical constructs in textual data, which compromise successful information extraction. To address these issues, we describe the development of Textractor, an application for accurately extracting relevant concepts from irregular textual narratives in datasets of product development and/or customer service organizations. The extracted information can subsequently be fed to a host of business intelligence activities. We present novel algorithms, combining statistical and linguistic approaches, for the accurate discovery of relevant domain concepts from highly irregular/ungrammatical texts. Evaluations on real-life corporate data revealed that Textractor extracts domain concepts, realized as single or multi-word terms in ungrammatical texts, with high precision.

Related resources

Towards A Semantic Tagger for Analysing Contents of Chinese Corporate Reports

In this paper, we report on an experiment in which we explore the feasibility of applying a semantic tagger for analysing the textual contents of Chinese corporate reports, focusing on the contents of corporate strategy. In recent years, Natural Language Processing (NLP) research has been giving increasing attention to automatic analysis of the textual contents of corporate reports using NLP ap...

Presenting a Model for Extracting Information from Textual Documents, Based on Text Mining in the E-Learning Domain

As computer networks become the backbones of science and the economy, enormous quantities of documents become available. Text mining techniques have therefore been used to extract useful information from textual data. Text mining has become an important research area that discovers unknown information, facts, or new hypotheses by automatically extracting information from different written documents. T...

Extracting Meronymy Relationships from Domain-Specific, Textual Corporate Databases

Various techniques for learning meronymy relationships from open-domain corpora exist. However, extracting meronymy relationships from domain-specific, textual corporate databases has been overlooked, despite numerous application opportunities, particularly in domains like product development and/or customer service. These domains also pose new scientific challenges, such as the absence of elabor...

From Glossaries to Ontologies: Extracting Semantic Structure from Textual Definitions

Learning ontologies requires the acquisition of relevant domain concepts and taxonomic, as well as non-taxonomic, relations. In this chapter, we present a methodology for automatic ontology enrichment and document annotation with concepts and relations of an existing domain core ontology. Natural language definitions from available glossaries in a given domain are processed and regular expressi...

Choosing appropriate theories for understanding hospital reporting of adverse drug events, a theoretical domains framework approach

Adverse drug events (ADEs) may cause serious injuries, including death. Spontaneous reporting of ADEs plays a major role in detecting and preventing them; however, underreporting always exists. Although several interventions have been used to solve this problem, they are mainly based on experience, and the rationale for choosing them has no theoretical basis. The vast variety of behavioral ...

Publication date: 2010
